Fix notification-delivery crash under load (thread-unsafe gethostbyname)#1937
Merged
Fix notification-delivery crash under load (thread-unsafe gethostbyname)#1937
Conversation
orionldServerConnect used gethostbyname, which on glibc returns a pointer into a single process-wide static hostent buffer. Concurrent notification delivery from multiple MHD threads raced on that shared buffer and could segfault — reproduced reliably by the NOTIFY perf scenario against -mongocOnly. Replace with getaddrinfo, which is POSIX-mandated thread-safe and returns per-call heap storage freed by freeaddrinfo. Behaviour is otherwise equivalent (IPv4, SOCK_STREAM, numeric port via AI_NUMERICSERV).
|
CLA Assistant Lite bot All contributors have signed the CLA ✍️ |
The functional-test workflow ran eclipse-mosquitto:1.6, which only speaks MQTT 3.1.1 and older. Tests that request MQTT-Version "mqtt5.0" (five of them, including ngsild_subCache-with-mongoc-on-startup) get MQTTClient_connect error -15 and an E: trace from mqttConnect. For most of those tests the MQTT connect happens after broker startup, so the harness doesn't care. But ngsild_subCache-with-mongoc-on-startup restarts the broker at step 07, the subCache reloads the MQTT sub, and the connect fails during startup — roughly 2 ms before the listening port opens. Since b83961a the harness's startup gate treats any ^E: on stderr as a dead broker, and GA runner jitter in that 2 ms window turns it into a flake. eclipse-mosquitto:2.0 supports MQTT 3.1.1 and 5.0, which removes the connect failure, the E: trace, and the race. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two small updates to doc/manuals-ld/installation-guide-*: - Ubuntu 18.04 / 20.04 / 22.04: section headers still said "Postgres 12" while the install commands below them already installed postgresql-17 (and CI runs postgis/postgis:17-3.5-alpine). Bump the headers to 17 so the page is internally consistent. - All six guides (Ubuntu 18.04/20.04/22.04, Debian 9/10, CentOS 7): add a one-line note that Orion-LD's MQTT subscription support includes MQTT 5.0, which requires Mosquitto 2.0 or later. On Ubuntu 22.04 the distro package is already 2.x; older distros need the upstream PPA / mosquitto.org repo if MQTT 5.0 is wanted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit bumped the mosquitto service container from
eclipse-mosquitto:1.6 to 2.0 to unblock MQTT 5.0 tests. But the 2.0
Docker image defaults to allow_anonymous=false, so every test that
creates an MQTT subscription now fails with MQTTClient_connect
error -1 (broker closes the unauthenticated connection).
Fix by dropping the service container and starting plain MQTT in the
existing MQTTS step instead, where we already control the config:
- Install mosquitto via apt (Ubuntu 24.04 ships 2.0.18 — supports
MQTT 5.0).
- Stop the systemd-launched instance (its default config also denies
anonymous).
- Launch on 1883 with allow_anonymous=true.
- Keep the existing 8883 MQTTS instance alongside it.
Rename the step from "Start MQTTS broker" to "Start MQTT + MQTTS
brokers" to reflect what it now does.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



orionldServerConnect used gethostbyname, which on glibc returns a pointer into a single process-wide static hostent buffer. Concurrent notification delivery from multiple MHD threads raced on that shared buffer and could segfault — reproduced reliably by the NOTIFY perf scenario against -mongocOnly.
Replace with getaddrinfo, which is POSIX-mandated thread-safe and returns per-call heap storage freed by freeaddrinfo. Behaviour is otherwise equivalent (IPv4, SOCK_STREAM, numeric port via AI_NUMERICSERV).